


Creators/Authors contains: "Cheung, Alvin"


  1. After decades of progress, database management systems (DBMSs) are now the backbones of many data applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remains as difficult as it was in the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we further argue that any automation technique must address three aspects: user specification, code generation, and result validation. We conclude by discussing a case study on video data processing, along with opportunities for future research towards designing data systems that are automatically generated.
    Free, publicly-accessible full text available August 1, 2024
  2. Exploiting the relationships among data is a classical query optimization technique. As persistent data is increasingly being created and maintained programmatically, prior work that infers data relationships from data statistics misses an important opportunity. We present Coco, the first tool that identifies data relationships by analyzing database-backed applications. Once these relationships are identified, Coco leverages them as constraints to optimize the application's physical design and query execution. Instead of developing a fixed set of predefined rewriting rules, Coco employs an enumerate-test-verify technique to automatically exploit the discovered data constraints to improve query execution. Each resulting rewrite is provably equivalent to the original query. Using 14 real-world web applications, our experiments show that Coco can discover numerous data constraints from code analysis and improve real-world application performance significantly.
  3. Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification. 
  4. Research on transaction processing has made significant progress towards improving performance of main memory multicore OLTP systems under low contention. However, these systems struggle on workloads with lots of conflicts. Partitioned databases (and variants) perform well on high contention workloads that are statically partitionable, but time-varying workloads often make them impractical. Towards addressing this, we propose Strife, a novel transaction processing scheme that clusters transactions together dynamically and executes most of them without any concurrency control. Strife executes transactions in batches, where each batch is partitioned into disjoint clusters without any cross-cluster conflicts and a small set of residuals. The clusters are then executed in parallel with no concurrency control, followed by residuals separately executed with concurrency control. Strife uses a fast dynamic clustering algorithm that exploits a combination of random sampling and concurrent union-find data structure to partition the batch online, before executing it. Strife outperforms lock-based and optimistic protocols by up to 2× on high contention workloads. While Strife incurs about 50% overhead relative to partitioned systems in the statically partitionable case, it performs 2× better when such static partitioning is not possible and adapts to dynamically varying workloads.
  5. Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Past work on leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, reduces compressed video size while maintaining perceptual quality, but requires significant changes to video codecs and ignores the data management of this perceptual information. In this paper, we propose Vignette, a compression technique and storage manager for perception-based video compression in the cloud. Vignette complements off-the-shelf compression software and hardware codec implementations. Vignette's compression technique uses a neural network to predict saliency information used during transcoding, and its storage manager integrates perceptual information into the video storage system. Our results demonstrate the benefit of embedding information about the human visual system into the architecture of cloud video storage systems.
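
The batch-partitioning idea in the Strife abstract (item 4) can be illustrated with a small union-find sketch. This is not the paper's algorithm: it is a simplified, single-threaded illustration that assumes each transaction's set of accessed record keys is known up front, and it omits the random sampling, the residual set, and the concurrent union-find that the abstract mentions. The names `UnionFind` and `cluster_batch` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal sequential union-find over hashable keys (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen keys as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_batch(batch):
    """Group transactions so that no record key is shared across groups.

    `batch` maps a transaction id to the non-empty set of record keys it
    accesses (an assumption of this sketch, not something the abstract states).
    """
    uf = UnionFind()
    # Keys touched by the same transaction must end up in the same cluster.
    for keys in batch.values():
        ordered = sorted(keys)
        for key in ordered[1:]:
            uf.union(ordered[0], key)
    # Assign each transaction to the cluster that owns its keys.
    clusters = defaultdict(list)
    for txn, keys in batch.items():
        clusters[uf.find(next(iter(keys)))].append(txn)
    return list(clusters.values())


if __name__ == "__main__":
    batch = {
        "t1": {"a", "b"},
        "t2": {"b", "c"},
        "t3": {"x"},
        "t4": {"x", "y"},
    }
    for group in cluster_batch(batch):
        print(group)
```

Because transactions in different groups touch disjoint keys, each group could in principle be run serially on its own core without locks. In a skewed workload this naive grouping may collapse into one large cluster, which is presumably the situation the sampling and residual set described in the abstract are meant to handle.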